
Forty years of The Selfish Gene are not enough

Author: AR Mushegian
C Pál
ES Lander
I Yanai
Itai Yanai
KE Nelson
Martin J. Lercher
RD Fleischmann
Publication venue: Springer Science and Business Media LLC

Comparing biological networks via graph compression

Author: A Kocsor
AR Mushegian
BP Kelley
DJ Cook
H Morgan
H Ogata
J Yang
L Peshkin
M Adler
M Hayashida
M Kanehisa
M Li
M Zaslavskiy
Morihiro Hayashida
N Krasnogor
R Singh
RY Pinter
S Wernicke
T Ito
Tatsuya Akutsu
Y Tohsato
Z Li
Z Liang
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Comparison of various kinds of biological data is one of the main problems in bioinformatics and systems biology. Data compression methods have been applied to comparison of large sequence data and protein structure data. Since it is still difficult to compare global structures of large biological networks, it is reasonable to try to apply data compression methods to comparison of biological networks. In existing compression methods, the uniqueness of compression results is not guaranteed because there is some ambiguity in selection of overlapping edges. Results: This paper proposes novel efficient methods, CompressEdge and CompressVertices, for comparing large biological networks. In the proposed methods, an original network structure is compressed by iteratively contracting identical edges and sets of connected edges. Then, the similarity of two networks is measured by a compression ratio of the concatenated networks. The proposed methods are applied to comparison of metabolic networks of several organisms, H. sapiens, M. musculus, A. thaliana, D. melanogaster, C. elegans, E. coli, S. cerevisiae, and B. subtilis, and are compared with an existing method. These results suggest that our methods can efficiently measure the similarities between metabolic networks. Conclusions: Our proposed algorithms, which compress node-labeled networks, are useful for measuring the similarity of large biological networks.
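
The idea of scoring similarity by how well two concatenated networks compress can be illustrated with a rough sketch. This is not the CompressEdge or CompressVertices algorithm from the paper: it swaps in a general-purpose byte compressor (zlib) applied to a serialized edge list, and the function names, the serialization format and the toy networks are all assumptions made for illustration.

```python
import zlib


def serialize(edges):
    """Turn an undirected, node-labeled edge list into canonical bytes."""
    # Sort the labels within each edge, then sort the edges, so the same
    # network always produces the same byte string.
    lines = sorted("{}\t{}".format(*sorted(edge)) for edge in edges)
    return "\n".join(lines).encode("utf-8")


def compressed_size(data):
    """Size in bytes after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


def concat_compression_ratio(edges_a, edges_b):
    """Compressed size of the concatenation over the sum of separate sizes.

    Hypothetical stand-in score: values near 0.5 mean the second network
    adds almost nothing new; values near 1.0 mean little shared content.
    """
    a, b = serialize(edges_a), serialize(edges_b)
    joint = compressed_size(a + b"\n" + b)
    return joint / (compressed_size(a) + compressed_size(b))


# Two made-up toy networks with disjoint label sets (illustration only).
net_a = [("m%d" % ((i * 37) % 1009), "m%d" % ((i * 101 + 5) % 1009))
         for i in range(300)]
net_b = [("k%d" % ((i * 53) % 997), "k%d" % ((i * 211 + 3) % 997))
         for i in range(300)]

ratio_self = concat_compression_ratio(net_a, net_a)
ratio_other = concat_compression_ratio(net_a, net_b)
print(ratio_self, ratio_other)
```

A network concatenated with itself should score lower than two unrelated networks. A byte-level compressor only sees textual repetition, so unlike the graph-contraction methods described in the abstract it cannot detect shared structure between differently serialized subgraphs.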

Kyoto University Research Information Repository

Identification and characterisation of tomato torrado virus, a new plant picorna-like virus from tomato

Author: A Shevchenko
A. M. Dullemans
AE Gorbalenya
AR Mushegian
C Li
D James
EV Koonin
G Nyland
J. F. J. M. van den Heuvel
JD Thompson
JF Bazan
JR Thompson
M Volpicella
M. Verbeek
P Argos
P. C. Maris
R. A. A. van der Vlugt
RAA van der Vlugt
RD Page
T Candresse
UK Laemmli
Publication venue: Springer-Verlag
Publication date: 01/01/2007

A new virus was isolated from tomato plants from the Murcia region in Spain that showed symptoms of ‘torrado disease’: very distinct necrotic, almost burn-like symptoms on the leaves of infected plants. The virus particles are isometric with a diameter of approximately 28 nm. The viral genome consists of two (+)ssRNA molecules of 7793 (RNA1) and 5389 nts (RNA2). RNA1 contains one open reading frame (ORF) encoding a predicted polyprotein of 241 kDa that shows conserved regions with motifs typical of a protease-cofactor, a helicase, a protease and an RNA-dependent RNA polymerase. RNA2 contains two partially overlapping ORFs potentially encoding proteins of 20 and 134 kDa. These viral RNAs are encapsidated by three proteins with estimated sizes of 35, 26 and 23 kDa. Direct protein sequencing mapped these coat proteins to ORF2 on RNA2. Phylogenetic analyses of nucleotide and derived amino acid sequences showed that the virus is related to but distinct from viruses belonging to the genera Sequivirus, Sadwavirus and Cheravirus. This new virus, for which the name tomato torrado virus is proposed, most likely represents a member of a new plant virus genus.

Repository for Publications and Research Data

Algorithm of OMA for large-scale orthology inference

Author: A Alexeyenko
A Bateman
A Schneider
AC Berglund-Sonnhammer
AK Bjorklund
Alexander CJ Roth
AM Altenhoff
AR Mushegian
C Dessimoz
CEV Storm
Christophe Dessimoz
CM Zmasek
D Fulton
DA Benson
DP Wall
ELL Sonnhammer
Gaston H Gonnet
K Chen
L Jensen
L Li
M Dayhoff
M Farrar
M Gil
M Remm
P Flicek
R Balasubramanian
RA Notebaart
RL Tatusov
RTJMvan der Heijden
TF DeLuca
TF Smith
WM Fitch
Publication venue: BioMed Central
Publication date: 01/12/2008

Since the publication of our article (Roth, Gonnet, and Dessimoz: BMC Bioinformatics 2008 9: 518), we have noticed several errors, which we correct below.

Repositório Institucional da Universidade de Brasília

UCL Discovery

Differential metabolism of Mycoplasma species as revealed by their genomes

Author: Altschul SF
Andrea Q. Maranhão
Angata T
Begley TP
Bradbury JM
Chaturvedi V
Cho MK
Doman-Pytka M
Eze MO
Fabricio B.M. Arraes
Fábio O. Pedrosa
Glass JI
Hoffmann GE
Levine RL
Marcelo M. Brígido
Maria José A. de Carvalho
Maria Sueli S. Felipe
Meinnel T
Morowitz HJ
Morris VK
Mushegian AR
Oshima K
Pei D
Peterson SN
Pitkänen JP
Razin S
Rittmann D
San Mateo LR
Tauber AI
Varki A
Vasconcelos AT
Vestweber D
Zimmer C
Publication venue: FapUNIFESP (SciELO)
Publication date: 01/01/2007

The annotation and comparative analyses of the genomes of Mycoplasma synoviae and Mycoplasma hyopneumoniae, as well as of other Mollicutes (a group of bacteria devoid of a rigid cell wall), have laid the groundwork for a global understanding of their metabolism and infection mechanisms. According to the annotation data, M. synoviae and M. hyopneumoniae are able to perform glycolytic metabolism, but do not possess the enzymatic machinery for the citrate and glyoxylate cycles, gluconeogenesis and the pentose phosphate pathway. Both can synthesize ATP by lactic fermentation, but only M. synoviae can convert acetaldehyde to acetate. Our genome analysis also revealed that M. synoviae and M. hyopneumoniae are not expected to synthesize polysaccharides, but they can take up a variety of carbohydrates via the phosphoenolpyruvate-dependent phosphotransferase system (PEP-PTS). Our data showed that these two organisms are unable to synthesize purines and pyrimidines de novo, since they possess only the sequences that encode salvage pathway enzymes. Comparative analyses of M. synoviae and M. hyopneumoniae with other Mollicutes have revealed differential genes in the former two genomes coding for enzymes that participate in carbohydrate, amino acid and nucleotide metabolism and in host-pathogen interaction. The identification of these metabolic pathways will provide a better understanding of the biology and pathogenicity of these organisms.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

arXiv.org e-Print Archive

Genome Sizes and the Benford Distribution

Author: A Berger
A Fernández
A Rényi
AR Mushegian
B Alberts
CE Shannon
D Hartl
DA Beard
EH Davidson
F Benford
HP Yockey
James L. Friar
JD Watson
JI Glass
JS Mattick
Juan Pérez–Mercader
M Lynch
M Wang
Matthew E. Hudson
MW McCoy
N Goldenfeld
R Gil
R Phillips
R Pinkham
RJ Taft
RW Hamming
SE Ahnert
SN Peterson
TA Brown
Terrance Goldman
TM Cover
TP Hill
Publication venue: Public Library of Science
Publication date: 18/05/2012

BACKGROUND: Data on the number of Open Reading Frames (ORFs) coded by genomes from the three domains of life show the presence of some notable general features. These include essential differences between the Prokaryotes and Eukaryotes, with the number of ORFs growing linearly with total genome size for the former, but only logarithmically for the latter. RESULTS: Simply by assuming that the (protein) coding and non-coding fractions of the genome must have different dynamics and that the non-coding fraction must be particularly versatile and therefore be controlled by a variety of (unspecified) probability distribution functions (PDFs), we are able to predict that the number of ORFs for Eukaryotes follows a Benford distribution and must therefore have a specific logarithmic form. Using the data for the 1000+ genomes available to us in early 2010, we find that the Benford distribution provides excellent fits to the data over several orders of magnitude. CONCLUSIONS: In its linear regime the Benford distribution produces excellent fits to the Prokaryote data, while the full non-linear form of the distribution similarly provides an excellent fit to the Eukaryote data. Furthermore, in their region of overlap the salient features are statistically congruent. This allows us to interpret the difference between Prokaryotes and Eukaryotes as the manifestation of the increased demand in the biological functions required for the larger Eukaryotes, to estimate some minimal genome sizes, and to predict a maximal Prokaryote genome size on the order of 8-12 megabasepairs. These results naturally allow a mathematical interpretation in terms of maximal entropy and, therefore, most efficient information transmission.
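
For orientation, the best-known form of Benford's law is the first-digit rule, P(d) = log10(1 + 1/d) for d = 1..9. The sketch below computes those probabilities and compares them with observed leading-digit frequencies. It is an assumption that this classical rule is a helpful illustration here: the abstract does not give the functional form the authors fit to ORF counts versus genome size, and the helper names and the powers-of-two test data are made up for the example.

```python
import math


def benford_first_digit_probs():
    """Classical Benford probabilities for leading digits 1 through 9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(n):
    """Leading decimal digit of a nonzero integer."""
    return int(str(abs(int(n)))[0])


def first_digit_frequencies(values):
    """Observed share of each leading digit, ignoring zeros."""
    nonzero = [v for v in values if int(v) != 0]
    counts = {d: 0 for d in range(1, 10)}
    for v in nonzero:
        counts[first_digit(v)] += 1
    return {d: c / len(nonzero) for d, c in counts.items()}


# Powers of two are a standard sequence whose leading digits follow
# Benford's law closely; they stand in for real data here.
probs = benford_first_digit_probs()
freqs = first_digit_frequencies([2 ** n for n in range(100)])
for d in range(1, 10):
    print(d, round(probs[d], 3), round(freqs[d], 3))
```

Applying the same comparison to genome data would need the ORF counts themselves, which are not part of this listing.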

Public Library of Science (PLOS)

FigShare

OrthoSelect: a protocol for selecting orthologous groups in phylogenomics

Author: A Dress
A Subramanian
AG Hatzigeorgiou
AJ Enright
AR Mushegian
AR Subramanian
B Misof
B Morgenstern
Burkhard Morgenstern
C Dessimoz
C Lottaz
C Notredame
C Zmasek
CB Do
CW Dunn
Dirk Erpenbeck
E Birney
E Sonnhammer
EV Koonin
F Chen
F Delsuc
F Schreiber
Fabian Schreiber
Gert Wörheide
H Gee
H Philippe
J Castresana
J Ruan
J Wasmuth
J Wiens
JA Eisen
JE Stajich
JGB Changhui Yan
K Dolinski
K Katoh
Kerstin Pick
KP O'Brien
L Duret
L Li
M Schmollinger
O Poirot
R Chenna
R Durbin
R Edgar
R Tatusov
RC Edgar
S Altschul
SJ Bourlat
SR Eddy
T Gentzsch
T Tatusova
WM Fitch
Y Fukunishi
Y Zhou
Z Zhang
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Phylogenetic studies using expressed sequence tags (EST) are becoming a standard approach to answer evolutionary questions. Such studies are usually based on large sets of newly generated, unannotated, and error-prone EST sequences from different species. A first crucial step in EST-based phylogeny reconstruction is to identify groups of orthologous sequences. From these data sets, appropriate target genes are selected, and redundant sequences are eliminated to obtain suitable sequence sets as input data for tree-reconstruction software. Generating such data sets manually can be very time consuming. Thus, software tools are needed that carry out these steps automatically. Results: We developed a flexible and user-friendly software pipeline, running on desktop machines or computer clusters, that constructs data sets for phylogenomic analyses. It automatically searches assembled EST sequences against databases of orthologous groups (OG), assigns ESTs to these predefined OGs, translates the sequences into proteins, eliminates redundant sequences assigned to the same OG, creates multiple sequence alignments of identified orthologous sequences and offers the possibility to further process this alignment in a last step by excluding potentially homoplastic sites and selecting sufficiently conserved parts. Our software pipeline can be used as it is, but it can also be adapted by integrating additional external programs. This makes the pipeline useful for non-bioinformaticians as well as for bioinformatics experts. The software pipeline is especially designed for ESTs, but it can also handle protein sequences. Conclusion: OrthoSelect is a tool that produces orthologous gene alignments from assembled ESTs. Our tests show that OrthoSelect detects orthologs in EST libraries with high accuracy. In the absence of a gold standard for orthology prediction, we compared predictions by OrthoSelect to a manually created and published phylogenomic data set. Our tool was not only able to rebuild the data set with a specificity of 98%, but it detected four percent more orthologous sequences. Furthermore, the results OrthoSelect produces are in absolute agreement with the results of other programs, but our tool offers a significant speedup and additional functionality, e.g. handling of ESTs, computing sequence alignments, and refining them. To our knowledge, there is currently no fully automated and freely available tool for this purpose. Thus, OrthoSelect is a valuable tool for researchers in the field of phylogenomics who deal with large quantities of EST sequences. OrthoSelect is written in Perl and runs on Linux/Mac OS X.

Public Library of Science (PLOS)

Open Access LMU

Core Proteome of the Minimal Cell: Comparative Proteomics of Three Mollicute Species

Author: A Shevchenko
A Toledo-Arana
AC Forster
AR Mushegian
C Lartigue
CA Hutchison III
CM Fraser
CM Sassetti
Dmitry G. Alexeev
E Pennisi
EV Koonin
FJ Grundy
FM Commichau
G Fang
Gleb Y. Fisunov
IA Demina
Ilya G. Kondratov
Irina A. Demina
JA Eisen
JD Jaffe
JI Glass
JK Harris
K Kobayashi
M Beck
M Güell
Maria A. Galyamina
Marina V. Serebryakova
N Gupta
Nadezhda A. Zhukova
Nicolay A. Bazaleev
ON Jensen
R Jain
R Zhang
S Kühner
S Pereyre
S Rasmussen
SJ Callister
SN Peterson
T Baba
UK Laemmli
Vadim M. Govorun
Valentina G. Ladygina
Vladimir Brusic
Z Gitai
Publication venue: Public Library of Science
Publication date: 01/01/2011

Mollicutes (mycoplasmas) have been recognized as highly evolved prokaryotes with an extremely small genome size and very limited coding capacity. Thus, they may serve as a model of a ‘minimal cell’: a cell with the lowest possible number of genes yet capable of autonomous self-replication. We present the results of a comparative analysis of the proteomes of three mycoplasma species: A. laidlawii, M. gallisepticum, and M. mobile. The core proteome components found in the three mycoplasma species are involved in fundamental cellular processes that are necessary for the free living of cells. They include replication, transcription, translation, and minimal metabolism. The members of the proteome core seem to be tightly interconnected, with a number of interactions forming a core interactome, whether or not additional species-specific proteins are located on the periphery. We also obtained a genome core of the respective organisms and compared it with the proteome core. It was found that the genome core encodes 73 more proteins than the proteome core. Apart from proteins that may not be identified due to technical limitations, there are 24 proteins that seem not to be expressed under the optimal conditions.

CiteSeerX

Public Library of Science (PLOS)

Enzymes Are Enriched in Bacterial Essential Genes

Author: AG Holman
AM Gustafson
AR Mushegian
BJ Akerley
C Lartigue
CM Sassetti
CT French
E Pennisi
EC Webb
EP Rocha
EV Koonin
Feng Gao
GC Langridge
H Jeong
IK Jordan
IM Keseler
J Deng
J Henkel
JI Glass
K Knuth
K Kobayashi
KSJY Ko
LA Gallagher
M Ashburner
M Kanehisa
M May
M Seringhaus
NM de S Cameron
NR Salama
NT Liberati
R Zhang
RA Forsyth
Randy Ren Zhang
RR Chaudhuri
S Gerdes
S Gotz
S Kumar
S Saha
SY Gerdes
T Baba
V de Berardinis
Vasu D. Appanna
W Huang da
Y Chen
Y Ji
Publication venue: Public Library of Science
Publication date: 01/01/2011

Essential genes, those indispensable for the survival of an organism, play a key role in the emerging field of synthetic biology. Characterization of functions encoded by essential genes not only has important practical implications, such as in identifying antibiotic drug targets, but can also enhance our understanding of basic biology, such as functions needed to support cellular life. Enzymes are critical for almost all cellular activities. However, essential genes have not been systematically examined from the aspect of enzymes and the chemical reactions that they catalyze. Here, by comprehensively analyzing essential genes in 14 bacterial genomes in which large-scale gene essentiality screens have been performed, we found that enzymes are enriched in essential genes. Among essential enzymes, ligases (especially those forming carbon-oxygen bonds and carbon-nitrogen bonds), nucleotidyltransferases and phosphotransferases are overrepresented, while oxidoreductases are underrepresented. Furthermore, essential enzymes tend to associate with more gene ontology domains. These results, from the aspect of chemical reactions, provide further insights into the understanding of functions needed to support natural cellular life, as well as synthetic cells, and provide additional parameters that can be integrated into gene essentiality prediction algorithms.

CiteSeerX